-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZooKeeper to KRaft migration #9324
Conversation
...rator/src/main/java/io/strimzi/operator/cluster/operator/assembly/KafkaAssemblyOperator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments from an initial pass ...
...perator/src/main/java/io/strimzi/operator/cluster/model/KafkaBrokerConfigurationBuilder.java
Show resolved
Hide resolved
...perator/src/main/java/io/strimzi/operator/cluster/model/KafkaBrokerConfigurationBuilder.java
Outdated
Show resolved
Hide resolved
...er-operator/src/main/java/io/strimzi/operator/cluster/operator/assembly/KafkaReconciler.java
Outdated
Show resolved
Hide resolved
...er-operator/src/main/java/io/strimzi/operator/cluster/operator/assembly/KafkaReconciler.java
Outdated
Show resolved
Hide resolved
...er-operator/src/main/java/io/strimzi/operator/cluster/operator/assembly/KafkaReconciler.java
Outdated
Show resolved
Hide resolved
...er-operator/src/main/java/io/strimzi/operator/cluster/operator/assembly/KafkaReconciler.java
Show resolved
Hide resolved
25e87c6
to
8edb152
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ppatierno, I left some comments, but looks good.
I also tried to run a migration after merging the Kafka Roller changes. It worked fine until KRaftDualWriting state, where I got this error when it was trying to roll the controllers.
2023-11-23 18:09:04 WARN KafkaQuorumCheck:65 - Reconciliation #173(timer) Kafka(test/my-cluster): Error determining the controller quorum leader id
org.apache.kafka.common.errors.UnsupportedVersionException: The broker does not support DESCRIBE_QUORUM
After manually restarting the controllers one by one, the migration completed successfully. I verified that broker were connected to the quorum and working.
NAME DESIRED KAFKA REPLICAS DESIRED ZK REPLICAS READY METADATA STATE WARNINGS
my-cluster 3 3
my-cluster 3 3 True ZooKeeper
my-cluster 3 3 True ZooKeeper
my-cluster 3 3
my-cluster 3 3
my-cluster 3 3 True KRaftMigration
my-cluster 3 3 True KRaftDualWriting
my-cluster 3 3 True KRaftDualWriting
my-cluster 3 3
my-cluster 3 3
my-cluster 3 3 True KRaft
I'm also attaching the CO log for this test.
zk2kraft.log
...r/src/main/java/io/strimzi/operator/cluster/operator/resource/KafkaMetadataStateManager.java
Show resolved
Hide resolved
...r/src/main/java/io/strimzi/operator/cluster/operator/resource/KafkaMetadataStateManager.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/io/strimzi/operator/cluster/operator/resource/KafkaMetadataStateManager.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
/** | ||
* Compute the next desired nodes configuration related state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Compute the next desired nodes configuration related state | |
* Compute the next state for the metadata configuration to be applied to Kafka nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this ... it's more the "node configuration" (so if ZooKeeper and/or KRaft are needed) than the "metadata configuration". I mean from the metadata state, we are getting which node configuration we need, and this is represented by a "state". Maybe the overall naming is bad :-D
@@ -286,7 +300,9 @@ public Future<Void> reconcile(KafkaStatus kafkaStatus, Clock clock) { | |||
// This has to run after all possible rolling updates which might move the pods to different nodes | |||
.compose(i -> nodePortExternalListenerStatus()) | |||
.compose(i -> addListenersToKafkaStatus(kafkaStatus)) | |||
.compose(i -> updateKafkaVersion(kafkaStatus)); | |||
.compose(i -> updateKafkaVersion(kafkaStatus)) | |||
.compose(i -> maybeKafkaMetadataMigrationInProgress()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just use updateKafkaMetadataState, which also contains this logic. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by that? Can you elaborate a little bit more please?
@fvaleri thanks for testing the two PRs working together!
So what you are seeing is expected and it's a problem for us. When it's the time to roll the controllers (so moving from the "dual write" mode to the full KRaft), the controllers (and the brokers as well) are still connected to ZooKeeper and in that state, the |
9a5e48d
to
9467ec4
Compare
...perator/src/main/java/io/strimzi/operator/cluster/model/KafkaBrokerConfigurationBuilder.java
Outdated
Show resolved
Hide resolved
d35682a
to
9d7ecb4
Compare
3524cf3
to
634b897
Compare
a541feb
to
27b6736
Compare
9c59528
to
7da7fda
Compare
Signed-off-by: Paolo Patierno <[email protected]> Moved KafkaMetadataStateManager into the resource package Signed-off-by: Paolo Patierno <[email protected]> Added KafkaAgent endpoint to retrive ZooKeeper migration state metric Updated KafkaAgent client to retrieve ZooKeeper migration state Added logic to KafkaReconciler to check when migration ends Signed-off-by: Paolo Patierno <[email protected]> Fixed spotbugs issue Signed-off-by: Paolo Patierno <[email protected]> Moved KafkaMetadataConfigurationState in its own file Exposed metadata configuration state evaluation in corresponding KafkaMetadataConfigurationState instance methods Using loop on controllers to get the ZooKeeper migration metrics Other cleaning and refactor changes as per JS comments Signed-off-by: Paolo Patierno <[email protected]> Removed unused Random class Signed-off-by: Paolo Patierno <[email protected]> Moved the KafkaMetadataStateManager creation to the ReonciliationState because needed in the assembly operator as well Signed-off-by: Paolo Patierno <[email protected]> Fixed not exposing control plane port 9090 on KRaft broker-only nodes Fixed still exposing replication port 9091 on KRaft controllers during migration Signed-off-by: Paolo Patierno <[email protected]> Fixed duplicate KRaft configuration section when mixed-node Signed-off-by: Paolo Patierno <[email protected]> Fixed tests Signed-off-by: Paolo Patierno <[email protected]> Fixed bug about keeping replication listener on controllers until post-migration Signed-off-by: Paolo Patierno <[email protected]> Some refactoring (also based on FV comments) Signed-off-by: Paolo Patierno <[email protected]> Fixed validation for nodepools and features already in pre-migration Signed-off-by: Paolo Patierno <[email protected]> Addition on the configuration migration table Removed not used method Signed-off-by: Paolo Patierno <[email protected]> Fixed checkstyle issue Signed-off-by: Paolo Patierno <[email protected]> Running spec checker validation only when migration happens to KRaft not before Signed-off-by: Paolo Patierno <[email protected]> Fixed test Signed-off-by: Paolo Patierno <[email protected]> Updated liveness, readiness and run scripts to take process.roles into account Signed-off-by: Paolo Patierno <[email protected]> Added warning conditions when strimzi.io/kraft annotation is not used in the right way Signed-off-by: Paolo Patierno <[email protected]> Refactored the KafkaAgent and client in relation to the KRaft migration endpoint Signed-off-by: Paolo Patierno <[email protected]> Added versions validation before starting a KRaft migration Signed-off-by: Paolo Patierno <[email protected]> Removed the useKRaft flag related to the feature gate flags Making reconciliation failing fast when useKRaft feature flag is not enabled Updated related tests Signed-off-by: Paolo Patierno <[email protected]>
7da7fda
to
44a5906
Compare
Closing this PR which is now replaced by #9480 having same content but originating from the |
Type of change
Description
This PR fixes #9433 about implementing the ZooKeeper to KRaft migration process.
The current status is about starting from a ZooKeeper-based cluster, creating a node pool with controllers, applying the
strimzi.io/kraft: migration
annotation on theKafka
CR and allowing the operator FSM to start the migration up to the "dual-write" (so creating controllers, rolling brokers, start migration). Then the user can apply thestrimzi.io/kraft: enabled
annotation in order to finilize the process by having the full KRaft-based cluster,Different things are still missing:
setting some warnings in theKafka
CR status when annotation is applied in non valid states;checking if readiness/liveness scripts need to evaluate the node role other than the migration state. Maybe it won't be fixed until the new KRaft readiness/liveness will land into the codebase;checking the values for KAFKA_READY and ZK_CONNECTED env var in the Kafka run script;theuseKRaft
flag passed around to theKafkaCluster
is now considered "just" the value of the feature gate (it doesn't take into accountstrimzi.io/kraft: enabled
annotation anymore, because now KRaft can be early enabled during migration). Maybe I should use anyway this flag together with all other checks? Not sure ...logic in the Kafka agent able to get metrics about migration allowing the FSM to know when migration ends and move on to the next phase (which should already work because it would be just a matter of changing thestrimzi.io/kraft
annotation toenabled
and allowing the operator to update nodes configuration and rolling them). NOTE: rolling controllers is a blocker due to Add support for KRaft in KafkaRoller #9146)The PR won't address documentation. There will be a different PR for it.
Checklist
Please go through this checklist and make sure all applicable tasks have been done